Abstract — We show that optimal protocols for noisy channel 
coding of public or private information over either classical or 
quantum channels can be directly constructed from two more 
primitive information-theoretic tools: privacy amplification and 
information reconciliation, also known as data compression with 
side information. We do this in the one-shot scenario of structure- 
less resources, and formulate our results in terms of the smooth 
min- and max-entropy. In the context of classical information 
theory, this shows that essentially all two-terminal protocols can 
be reduced to these two primitives, which are in turn governed by 
the smooth min- and max-entropies, respectively. In the context of 
quantum information theory, the recently-established duality of 
these two protocols means essentially all two-terminal protocols 
can be constructed using just a single primitive. 

Index Terms — quantum information, channel coding, privacy 
amplification, information reconciliation, Slepian-Wolf coding, 
smooth entropies 

ONE of the major trends in information theory, both 
classical and quantum, is that a small set of proof 
techniques can be used to construct a wide variety of protocols. 
Random coding is as ubiquitous as it is useful in classical 
information theory, and the method of decoupling increasingly 
plays a similar role in quantum information theory. Instead 
of reusing proofs, a different approach is to reuse the pro- 
tocols themselves, building up more complicated protocols 
by combining simpler ones. The goal is to do this in such 
a way that the inner workings of the parts do not have to 
be analyzed to ensure the correct functioning of the overall 
protocol. For instance, joint source-channel coding can be 
accomplished by simply combining a data compressor with a 
channel coding scheme [1]. In the quantum realm, the "mother 
of all" protocols, a fully-quantum version of the Slepian-Wolf 
task, can generate a variety of two-terminal protocols involving 
entanglement when combined with teleportation and dense 
coding [2]. 

In this paper we construct optimal protocols for commu- 
nication of classical information over noisy channels from 
two simpler primitives: randomness extraction and information 
reconciliation, also known as data compression with side infor- 
mation. The construction works for either classical channels or 
quantum channels explicitly accepting classical inputs, and by 
replacing randomness extraction with privacy amplification, 
we directly obtain a protocol for private communication. 
We work in the one-shot scenario of structureless resources, 
meaning the coding scheme does not rely on repeated uses 
of a memoryless channel. Rather, the one-shot scenario is 
considerably broader in approach, encompassing not only 
channels in the traditional sense of communication (both with 
and without memory), but also channels as models for the 
dynamics of a physical system, for which the memoryless 
assumption would be out of place. 

Besides the new technique used to construct the protocols, 
the resulting capacity expressions are novel as well. We find 
that the capacities of a channel for one-shot public¹ or private 
communication can be characterized in terms of smooth conditional 
min- and max-entropies, introduced and characterized in [3], [4], [5]. 
Furthermore, these expressions are shown to be essentially tight, 
up to small additive terms. Appealing to the asymptotic 
equipartition property (AEP) for smooth entropies [6] allows us to 
quickly recover the usual capacity expressions in the memoryless 
case, from Shannon's original result on the capacity of the 
classical channel for public communication [7] and the associated 
capacity for private communication [8], [10], to the capacity of a 
quantum channel for public classical communication (known 
colloquially as the Holevo-Schumacher-Westmoreland, or HSW, 
Theorem) [11], [12], as well as for private communication [13]. 
Furthermore, dividing the problem of noisy channel communication 
into questions of coding and questions of channel properties 
considerably simplifies the logical arguments and should be of 
independent pedagogical value. 

One-shot expressions for the capacity of public communication 
have been derived before. In [14] the one-shot capacity 
of a classical channel was characterized in terms of smooth 
min- and max-entropies, while [15] derives an expression 
for the capacity of quantum channels in terms of generalized 
(Rényi) relative entropies following a hypothesis-testing 
approach. These results can be seen as generalizations of 
earlier (asymptotic) results based on the information spectrum 
method [16], [17]. Very recently, [18] found tight bounds on 
the capacity in terms of a smooth relative entropy quantity, 
again from a hypothesis-testing approach. Combining the latter 
results with those here implies a relation between the smooth 
relative and conditional entropies. 

Both of the primitive tasks used here are designed to 
manipulate "static" resources, in the sense that the goal is to 
transform randomness shared by distant parties into a different 
form (so that the joint distribution of the values held by the 
parties is close to a given one). This task should be performed 
using local operations and a limited amount of communication. 
In particular, randomness extraction corresponds to the task of 
generating uniformly-distributed random variables out of 
non-uniform inputs, while information reconciliation uses classical 
communication to correlate, or reconcile, a random variable 
held by one party (Alice) with that held by another (Bob). 
One can view the classical data transmitted for this latter task 
as a compression of Alice's random variable, as it can be 
decompressed by Bob with the help of his random variable 
(or quantum system). 

¹ We use 'public communication' to refer to the usual task of sending 
classical information, in order to better distinguish it from the case of private 
communication. 

Intuitively, it is plausible that the static information recon- 
ciliation protocol could be adapted to enable reliable com- 
munication over a noisy channel, a "dynamic" resource, in 
the following manner, depicted schematically in Figure 1. 
Assuming uniform distribution of the channel inputs, consider 
the random variables describing the input X and output Y 
(where the latter may be quantum in the case of a quan- 
tum channel). As described above, information reconciliation 
enables Bob to reconstruct X from the compressed version 
of it, C, along with his information Y. Now suppose Alice 
and Bob agree on a particular C = c* in advance for the 
communication task, in that Alice restricts her channel inputs 
X to those with compressed output c*. Upon receipt of Y, 
Bob can reconstruct X by simply reusing the decompressor 
of the information reconciliation protocol. In this way, each 
information reconciliation protocol defines a channel coding 
scheme: every value C = c specifies a channel code consisting 
of all the possible inputs which compress to that value c. 
Since we assumed uniform distribution of the channel inputs, 
this coding scheme is generally not optimal. However, by 
running a randomness extractor backwards we can create the 
optimal input distribution from a uniform one, circumventing 
this problem. This is similar to a method used by Gallager [19] 
and later expanded in [20], [21]. 

The remainder of the paper is devoted to making this 
intuition rigorous. We begin in the next section by formally 
specifying the problem and stating the results in Theorem 1. 
We then move immediately to the proof of the direct part, 
achievability, in Section II and the converse in Section III. In 
Section IV we show how the usual results may be quickly 
recovered for the case of very many uses of a memoryless 
channel. We conclude in Section V by discussing applications 
of this result and its relation to other work. 

I. Definitions and Results 

We work directly with a classical-quantum channel $\Theta$ taking 
input classical symbols $x \in \mathcal{X}$ to output quantum states 
$\vartheta^Y_x \in \mathcal{S}(\mathcal{H}^Y) = \mathcal{Y}$. To recover 
the case of a classical channel, one can simply require that the 
output states $\vartheta^Y_x$ be simultaneously diagonalizable. 

We note that the restriction to classical channel inputs can 
be made without loss of generality as we are interested in 
transmitting classical information, either publicly or privately. 
In terms of a physical channel accepting quantum inputs, this 
just amounts to fixing the quantum states to be input for a 
given classical value $x$; here this choice is effectively part of 
the channel (this is possible since, in the one-shot treatment 
adopted here, the channel is only used once). On the other 
hand, one may regard this choice as part of the encoder, and the 
only necessary modification of the expression for the capacity 
(see Theorem 1 below) would be to include an optimization 
over this choice. 

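An $(n,\varepsilon)$-coding scheme for a classical-quantum channel $\Theta$ 
consists of an encoder $\mathrm{Enc} : \mathcal{M} \to \mathcal{X}$ taking 
classical messages $m \in \mathcal{M}$ to channel inputs and a decoder 
$\mathrm{Dec} : \mathcal{Y} \to \mathcal{M}$ taking channel outputs to 
guesses of the input messages, for which $n = \log_2 |\mathcal{M}|$ and 

$$p_{\mathrm{error}}(m) = \Pr[m \neq \mathrm{Dec} \circ \Theta \circ \mathrm{Enc}(m)] \le \varepsilon \qquad (1)$$

for all $m \in \mathcal{M}$. In addition, if $\Theta$ outputs a bipartite 
state $\vartheta^{YZ}$, of which Bob receives the $Y$ subsystem and an 
eavesdropper Eve the $Z$ subsystem living in 
$\mathcal{S}(\mathcal{H}^Z) = \mathcal{Z}$, then an $(n,\varepsilon)$-private 
coding scheme is an encoder-decoder pair as above, with the additional 
requirement that every message $m$ be approximately unknown to the 
eavesdropper: 

$$p_{\mathrm{secret}}(m) = \tfrac{1}{2}\left\| \vartheta^Z_{\mathrm{Enc}(m)} - \vartheta^Z \right\|_1 \le \varepsilon, \qquad (2)$$

where $\vartheta^Z = \frac{1}{|\mathcal{M}|} \sum_{m} \vartheta^Z_{\mathrm{Enc}(m)}$.² 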

It is useful to think of the message as being a random 
variable $M$, taking values in $\mathcal{M}$ according to the probability 
distribution $P_M$. Then the message transmission process is 
encapsulated by the following sequence of random variables 
(Markov chain). We use a prime to denote a random variable 
or quantum system which is meant to be nearly identical to 
the unprimed version; in the present context we would like 
the output $M'$ to be essentially equal to the input $M$, 

$$M \xrightarrow{\ \mathrm{Enc}\ } X \xrightarrow{\ \Theta\ } Y \xrightarrow{\ \mathrm{Dec}\ } M'. \qquad (3)$$

Given a choice of $\varepsilon$, the capacity for public communication 
(usually just referred to as the classical capacity) 
$C^{\varepsilon}_{\mathrm{pub}}(\Theta)$ of the channel is simply the largest 
$n$ for which an $(n,\varepsilon)$-coding scheme exists. The private capacity 
$C^{\varepsilon}_{\mathrm{prv}}(\Theta)$ is defined similarly using private 
coding schemes. Here we prove the following upper and lower bounds on these 
capacities in terms of the smooth min- and max-entropies, which are defined 
in the appendix. 

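Theorem 1 (Capacities of Classical-Quantum Channels). 
For all $\varepsilon > 0$, 

$$C^{\varepsilon}_{\mathrm{pub}}(\Theta) \ge \max_{P_X} \left[ H^{\varepsilon/8}_{\min}(X) - H^{\varepsilon/8}_{\max}(X|Y) \right] - 4\log\tfrac{1}{\varepsilon} - O(1), \qquad (4)$$

$$C^{\varepsilon}_{\mathrm{pub}}(\Theta) \le \max_{P_X} \left[ H_{\min}(X) - H^{\sqrt{2\varepsilon}}_{\max}(X|Y) \right]. \qquad (5)$$

For $T \to X \to (Y,Z)$ a Markov chain, 

$$C^{\varepsilon}_{\mathrm{prv}}(\Theta) \ge \max_{P_T,\, T \to X} \left[ H^{\varepsilon/8}_{\min}(T|Z) - H^{\varepsilon/8}_{\max}(T|Y) \right] - 4\log\tfrac{1}{\varepsilon} - 16, \qquad (6)$$

$$C^{\varepsilon}_{\mathrm{prv}}(\Theta) \le \max_{P_T,\, T \to X} \left[ H^{\sqrt{2\varepsilon}}_{\min}(T|Z) - H^{\sqrt{2\varepsilon}}_{\max}(T|Y) \right]. \qquad (7)$$
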

² In fact, $\vartheta^Z$ need not be the average state, but can be arbitrary. 

II. Achievability 

The proof of the direct parts proceeds in three steps, 
successively building up to a construction of an encoder and 
decoder. The first step is to show that in doing this, we 
only need to worry about the average transmission error and 
secrecy of the communication scheme, assuming the inputs 
are uniformly distributed. Then, we show that protocols for in- 
formation reconciliation can be adapted to the channel coding 
scenario, when the input to the channel is uniformly distributed 
(not just the messages themselves). Finally, we show how to 
mimic any particular channel input distribution from a uniform 
distribution by using randomness extraction, and how to mimic 
an input distribution so that the eavesdropper learns nothing 
about the message using privacy amplification. 
Fig. 1. Schematic of using randomness extraction and information 
reconciliation (data compression with side information) to perform noisy 
channel communication. Messages $m \in \mathcal{M}$ are input to the encoder 
Enc' and subsequently to the shaper Shp, which is a randomness extractor run 
in reverse. They are then transmitted over the channel $\Theta$ to the receiver, 
who uses the decoder Dec to construct a guess $m' \in \mathcal{M}'$ of the 
original input. Concatenating the shaper and channel gives a new effective 
channel $\Theta'$, for which an encoder/decoder pair (Enc', Dec) can be 
constructed by repurposing a compressor/decompressor pair that operates on the 
joint input-output UY of the channel. Ultimately, the shaper can instead be 
regarded as part of the encoder Enc, which is formed by concatenating Enc' 
and Shp. 



A. Average Case Coding Implies Worst Case Coding 

We start by observing that constructing an encoder/decoder 
pair with low average error probability on the receiver's end 
and low average trace distance of eavesdropper outputs suffices 
to construct an encoder/decoder pair with low error probability 
and secrecy parameter in the worst case. 

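Lemma 1 (Average Case to Worst Case Error). Given a 
channel $\Theta : \mathcal{X} \to \mathcal{Y}$ and an encoder/decoder pair 
$\mathrm{Enc} : \mathcal{M} \to \mathcal{X}$, 
$\mathrm{Dec} : \mathcal{Y} \to \mathcal{M}'$ such that 
$\frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} p_{\mathrm{error}}(m) \le \frac{\varepsilon}{4}$ 
and 
$\frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} p_{\mathrm{secret}}(m) \le \frac{\varepsilon}{4}$, 
there exists an encoder/decoder pair for a subset 
$\mathcal{M}^* \subset \mathcal{M}$ of size at least $|\mathcal{M}|/2$ such that 
$p_{\mathrm{error}}(m^*) \le \varepsilon$ and 
$p_{\mathrm{secret}}(m^*) \le \varepsilon$ for all $m^* \in \mathcal{M}^*$. 

Proof: By the Markov inequality, a fraction at most one-quarter 
of the $m \in \mathcal{M}$ have $p_{\mathrm{error}}(m) > \varepsilon$. 
Similarly, for at most one-quarter is $p_{\mathrm{secret}}(m) > \varepsilon$. 
Thus, there is a subset $\mathcal{M}^*$ of at least half the 
$m \in \mathcal{M}$ for which neither statement is true. 
Restricting the input of Enc to $\mathcal{M}^*$ gives the new encoder. 
The new decoder is given by altering the old decoder so that 
outputs $m \notin \mathcal{M}^*$ are mapped at random to 
$m \in \mathcal{M}^*$. ■ 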
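
The expurgation step in this proof can be checked numerically. The following is a minimal Python sketch, not part of the original construction: the helper names `expurgate` and `synthetic_values`, and the synthetic per-message values themselves, are illustrative assumptions. It keeps the messages whose error and secrecy parameters are both at most $\varepsilon$ and confirms that at least half survive when both averages are at most $\varepsilon/4$.

```python
import random

def expurgate(p_error, p_secret, eps):
    """Indices of messages whose error and secrecy parameters are both <= eps."""
    return [m for m, (pe, ps) in enumerate(zip(p_error, p_secret))
            if pe <= eps and ps <= eps]

def synthetic_values(num_messages, average, rng):
    """Hypothetical per-message values in [0, 1] whose mean is at most `average`."""
    raw = [rng.random() ** 4 for _ in range(num_messages)]
    scale = average / (sum(raw) / num_messages)
    # Clipping at 1 can only lower the mean, so the average bound still holds.
    return [min(1.0, r * scale) for r in raw]

rng = random.Random(0)
eps, num_messages = 0.1, 1024
p_error = synthetic_values(num_messages, eps / 4, rng)
p_secret = synthetic_values(num_messages, eps / 4, rng)

# Markov: at most a quarter of the messages violate each condition,
# so at least half of them satisfy both.
kept = expurgate(p_error, p_secret, eps)
print(len(kept), "of", num_messages, "messages survive")
```

The guarantee is a worst-case one; for benign value profiles such as the one generated here, far more than half of the messages typically survive.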



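Remark 1. If we only require that $p_{\mathrm{error}}(m) \le \varepsilon$, then 
$\frac{1}{|\mathcal{M}|} \sum_{m \in \mathcal{M}} p_{\mathrm{error}}(m) \le \frac{\varepsilon}{2}$ suffices. 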

B. Channel Coding From Information Reconciliation For 
Uniformly-Distributed Inputs 

Now we show that an information reconciliation protocol 
can be adapted to channel coding, at least when the input to 
the channel is uniformly or nearly uniformly distributed. We 
do this explicitly for the case of linear compression functions 
and subsequently remark how it can be made more general. 

First we need to specify classical-quantum information 
reconciliation protocols more precisely. Given a classical random 
variable $X$ and a quantum system $Y$ jointly described by 
the classical-quantum state 
$\psi^{XY} = \sum_{x \in \mathcal{X}} p_x |x\rangle\langle x|^X \otimes \vartheta^Y_x$, 
an $\varepsilon$-good information reconciliation protocol consists of a 
compression map $\mathrm{Cmp} : \mathcal{X} \to \mathcal{C}$ taking $X$ to 
another classical random variable $C$ and a decompression map 
$\mathrm{Dcp} : (\mathcal{C}, \mathcal{Y}) \to \mathcal{X}$ taking $C$ and 
states in the system $Y$ to elements $X'$ of the input alphabet 
$\mathcal{X}$ such that the error probability 
$p_{\mathrm{error}} = \sum_x p_x \Pr[x \neq \mathrm{Dcp}(\mathrm{Cmp}(x), \vartheta^Y_x)] \le \varepsilon$. 
If the alphabet $\mathcal{X}$ forms a linear space, the compression map could 
be linear, and one speaks of a linear compressor. 

Lemma 2 (Channel Coding from Information Reconciliation 
of Uniform Inputs). Given a cq channel $\Theta : \mathcal{U} \to \mathcal{Y}$ 
from uniformly-distributed inputs $U$ to arbitrary outputs $Y$, 
suppose the linear compressor and arbitrary decompressor 
pair Cmp/Dcp form an $\varepsilon$-good information reconciliation 
protocol for the combined input and output $UY$. Then there 
exists a linear encoder $\mathrm{Enc} : \mathcal{M} \to \mathcal{U}$ and a 
decoder $\mathrm{Dec} : \mathcal{Y} \to \mathcal{M}$ for $\Theta$ such that 
the error probability of transmitting a uniformly-distributed message $M$ 
of size $|\mathcal{M}| = |\mathcal{U}|/|\mathcal{C}|$ is also less than 
$\varepsilon$. 

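Proof: Start by defining 
$p_{\mathrm{error}}(u) = \Pr[u \neq \mathrm{Dcp}(\mathrm{Cmp}(u), \vartheta_u)]$. 
Then the information reconciliation error probability can be formulated as 

$$p_{\mathrm{error}} = \frac{1}{|\mathcal{U}|} \sum_{u} p_{\mathrm{error}}(u) = \frac{1}{|\mathcal{C}|} \sum_{c} \frac{|\mathcal{C}|}{|\mathcal{U}|} \sum_{u : \mathrm{Cmp}(u) = c} p_{\mathrm{error}}(u) = \frac{1}{|\mathcal{C}|} \sum_{c} \left\langle p_{\mathrm{error}}(u) \right\rangle_{P_{U|C=c}}, \qquad (8)$$

where $\langle X \rangle_{P_X}$ denotes the average of $X$ using the distribution 
$P_X$ and $P_U$ is the uniform distribution. In other words, the 
average error probability is the average over outputs $c$ of the 
average error probability of inputs $u$ consistent with a given 
output. In this expression we have split the summation over $u$ 
into first a summation over the values of $c$ and then for each of 
these a summation over the $u$ for which $\mathrm{Cmp}(u) = c$. In so 
doing, we have used the fact that there are $|\mathcal{U}|/|\mathcal{C}|$ 
preimages for each $c$, which follows from Lemma 3 (see appendix). 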

Choosing the value of $C = c^*$ with the lowest error 
probability $\langle p_{\mathrm{error}}(u) \rangle_{P_{U|C=c^*}}$ enables us to 
define an encoder and decoder from the compressor and decompressor restricted 
to this value. The encoder simply maps $m \in \mathcal{M}$ to those 
$u \in \mathcal{U}$ for which $\mathrm{Cmp}(u) = c^*$ in some fixed order, say 
lexicographic order. By linearity of the compressor, 
$|\mathcal{M}| = |\mathcal{U}|/|\mathcal{C}|$. The decoder is then defined by 
taking the output $u'$ of the decompressor and then applying the inverse of 
the encoding map to the result, or outputting a random $m \in \mathcal{M}$ 
when $\mathrm{Cmp}(u') \neq c^*$. 

The error probability for the encoder/decoder combination 
for uniformly distributed messages $M$ is exactly the same as 
the error probability for the compressor/decompressor combination 
restricted to inputs with $C = c^*$, which by the choice of $c^*$ is 
no larger than the average $\varepsilon$. 

■ 
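
A toy instance may help make the construction concrete. The sketch below is an illustration under assumptions not found in the paper: the compressor is taken to be the syndrome map of the [7,4] Hamming code, the channel is a toy one that flips at most one of seven bits, and all function names are hypothetical. Fixing the syndrome to a value `c_star` turns the compressor/decompressor pair into an encoder/decoder pair with $|\mathcal{M}| = |\mathcal{U}|/|\mathcal{C}| = 128/8 = 16$ messages.

```python
from itertools import product

# Parity-check matrix of the [7,4] Hamming code: column j (1-based) is j in binary.
H = [[(j >> i) & 1 for j in range(1, 8)] for i in range(3)]

def compress(u):
    """Linear compressor Cmp: the 3-bit syndrome H u (mod 2)."""
    return tuple(sum(h * x for h, x in zip(row, u)) % 2 for row in H)

def decompress(c, y):
    """Decompressor Dcp: recover u from c = Cmp(u) and y = u + e with wt(e) <= 1."""
    s = [a ^ b for a, b in zip(compress(y), c)]  # syndrome of the error pattern e
    pos = s[0] + 2 * s[1] + 4 * s[2]             # 0: no flip, else flipped position
    u = list(y)
    if pos:
        u[pos - 1] ^= 1
    return tuple(u)

def make_code(c_star):
    """Encoder/decoder pair obtained by fixing the compressor output to c_star."""
    codewords = sorted(u for u in product((0, 1), repeat=7) if compress(u) == c_star)
    index = {u: m for m, u in enumerate(codewords)}
    def enc(m):
        return codewords[m]  # m-th preimage of c_star in lexicographic order
    def dec(y):
        return index[decompress(c_star, y)]  # invert the encoding map
    return enc, dec, len(codewords)

def flip(u, pos):
    """Toy channel: flip bit `pos` (1-based) of u, or nothing if pos == 0."""
    return tuple(b ^ (i + 1 == pos) for i, b in enumerate(u))

enc, dec, size = make_code((1, 0, 1))
all_ok = all(dec(flip(enc(m), pos)) == m for m in range(size) for pos in range(8))
print(size, all_ok)
```

In this toy case the decompressor always returns a string with the agreed syndrome, so the random fallback of the general decoder is never needed, and every coset of the code performs equally well; in general the choice of $c^*$ matters.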

Remark 2. If the compressor/decompressor pair has error 
probability $\varepsilon_2$ when acting on a nearly uniform input $U'$ 
satisfying $\frac{1}{2}\|U - U'\|_1 \le \varepsilon_1$, then applying the 
corresponding encoder/decoder to a uniform input gives an error probability 
of at most $\varepsilon_1 + \varepsilon_2$ by the triangle inequality. 

Remark 3. By using Lemma 4 instead of Lemma 3 (see 
appendix), the restriction to linear compression functions can 
be removed at the cost of reducing the number of messages 
by a factor $\varepsilon$ and an additional failure probability $\varepsilon$. 

C. Distribution Shaping 

Finally, we need to remove the restriction of uniform inputs 
to the channel. This is done by combining the channel with a 
distribution shaper, which is a means of mapping a uniform 
distribution to a chosen distribution. By running the distribu- 
tion shaper and then the channel, we obtain a virtual channel 
which acts again on a (roughly) uniformly distributed input. 
The distribution shaper can be constructed using a randomness 
extractor, as follows. 

Suppose that $\mathrm{Ext} : \mathcal{X} \to \mathcal{U}'$ is a function which 
produces an $\varepsilon$-good approximation $U'$ of a uniformly distributed 
random variable $U$ from an input $X$ distributed according to $P_X$, 
in the sense that $\frac{1}{2}\|U - U'\|_1 \le \varepsilon$. The extractor 
defines a joint distribution $P_{XU'}$, and with this we can define a 
function $\mathrm{Shp} : (\mathcal{U}', \mathcal{R}) \to \mathcal{X}$ which is 
in some sense the inverse of Ext. Here $R$ is some additional randomness, and 
Shp is defined by using $R$ to select an $x \in \mathcal{X}$ from the 
distribution $P_{X|U'=u'}$ given the input value $U' = u'$. Thus, 
the output of the shaper is again $X$. Shapers constructed in 
this manner will be called $\varepsilon$-shapers. Moreover, if the extractor 
performs privacy amplification of $X$ against some $Z$ generated 
from $X$, then the shaper replicates $X$ while hiding $U'$ from 
the eavesdropper. This follows because the conditional states 
relevant to the eavesdropper are the same in both cases. 

It may seem strange to additionally require a source of 
randomness for this purpose, and ideally we would like all 
the randomness needed to generate X to be contained in U'. 
However, the mapping that takes a general $X$ to a nearly-uniform 
distribution $U'$ may map two values $x$ to the same 
$u'$. When that $u'$ is input to Shp, some randomness is needed 
to reverse the mapping. 
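
The shaper construction can be sketched in a few lines of Python. This is an illustration only: the function name `make_shaper`, the three-symbol distribution and the particular extractor are assumptions chosen so that $U'$ is exactly uniform. The shaper samples $x$ from $P_{X|U'=u'}$, and feeding it $U'$ reproduces $P_X$.

```python
import random
from collections import defaultdict

def make_shaper(p_x, ext):
    """Build Shp from the joint distribution P_{XU'} induced by P_X and Ext."""
    cond = defaultdict(dict)  # cond[u][x] = P_X(x), normalised below to P_{X|U'=u}
    for x, p in p_x.items():
        cond[ext(x)][x] = p
    p_u = {u: sum(d.values()) for u, d in cond.items()}  # distribution of U' = Ext(X)
    for u, d in cond.items():
        for x in d:
            d[x] /= p_u[u]
    def shp(u, rng):
        # The extra randomness R (here: rng) picks one preimage of u.
        xs = list(cond[u])
        return rng.choices(xs, weights=[cond[u][x] for x in xs])[0]
    return shp, p_u, dict(cond)

# Hypothetical example: Ext merges x = 0 and x = 1, so U' is exactly uniform.
p_x = {0: 0.25, 1: 0.25, 2: 0.5}
ext = lambda x: 0 if x < 2 else 1
shp, p_u, cond = make_shaper(p_x, ext)

# Output distribution of the shaper when its input is distributed as U'.
reproduced = {x: sum(p_u[u] * cond[u].get(x, 0.0) for u in p_u) for x in p_x}
sample = shp(0, random.Random(1))
print(p_u, reproduced, sample)
```

Here the input $u' = 0$ has two preimages, which is exactly the situation in which the additional randomness $R$ is needed, while $u' = 1$ is mapped deterministically to $x = 2$.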

D. Putting it all together 

Now we can combine these three pieces to establish the 
direct part of Theorem 1. We do this first for the private 
capacity and then make some modifications to obtain the lower 
bound on the classical capacity. The latter can be obtained as 
a special case of the former by assuming the channel does not 
leak anything to an adversary, i.e. $Z$ is trivial, but the additional 
modifications will improve the constants in the bound. 

For a given channel $\Theta : \mathcal{X} \to \mathcal{Y}$ and input 
distribution $P_X$, we can define a new channel 
$\Theta' : \mathcal{U}' \to \mathcal{Y}$ (with output states 
$\vartheta'^{Y}_{u'}$) by concatenating an $\varepsilon_1$-shaper Shp that 
generates $X$, built from a privacy amplification extractor, with $\Theta$ and 
regarding $R$ as part of the channel. Next, following Remark 2 we construct 
an encoder/decoder pair Enc'/Dec' from an $\varepsilon_2$-good 
compressor/decompressor for 
$\psi^{U'Y} = \sum_{u' \in \mathcal{U}} p_{u'} |u'\rangle\langle u'|^{U'} \otimes \vartheta'^{Y}_{u'}$, 
where $p_{u'}$ is the distribution of the input $U'$ to the shaper 
Shp. When input with a uniformly distributed $U$, the error 
probability averaged over codewords and choices of code is at 
most $\varepsilon_1 + \varepsilon_2$, while the average leakage to the 
eavesdropper is at most $2\varepsilon_1$. For simplicity, define 
$\varepsilon = 4\max(\varepsilon_1, \varepsilon_2)$. By the 
Markov inequality, at least three-quarters of the code choices 
have an average codeword error rate below 
$4(\varepsilon_1 + \varepsilon_2) \le 2\varepsilon$. By 
the same reasoning, at least three-quarters of the code choices 
have an average $p_{\mathrm{secret}}$ less than 
$8\varepsilon_1 \le 2\varepsilon$. Therefore, at least 
half have both properties. 

Regarding the shaper as part of the encoder instead of part 
of the channel, we can define $\mathrm{Enc} = \mathrm{Shp} \circ \mathrm{Enc}'$. 
Applying Lemma 1, we can then make the further adjustments to Enc 
and Dec to simultaneously achieve a worst-case error of $8\varepsilon$ 
and worst-case leakage $8\varepsilon$. 

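Finally, we can count how many messages can be reliably 
sent using the constructed encoder and decoder. From 
Lemma 2 we have $n \ge \log|\mathcal{U}| - \log|\mathcal{C}| - 1$. Inserting the 
known results for privacy amplification and data compression, 
Theorems 2 and 3 in the appendix, this becomes 

$$n \ge H^{\varepsilon_{11}}_{\min}(X|Z) - 2\log\frac{1}{\varepsilon_{12}} - H^{\varepsilon_{21}}_{\max}(U'|Y) - 2\log\frac{1}{\varepsilon_{22}} - 4, \qquad (9)$$

where $\varepsilon_1 = \varepsilon_{11} + \varepsilon_{12}$ and 
$\varepsilon_2 = \varepsilon_{21} + \varepsilon_{22}$. Again for simplicity, 
let $\varepsilon_{jk} = \varepsilon/8$. Because $U'$ is a function of $X$ via 
the extractor, the max-entropy cannot increase when replacing $X$ by $U'$. 
Since we are free to choose any $P_X$ in this argument we 
therefore have 

$$n \ge \max_{P_X} \left[ H^{\varepsilon/8}_{\min}(X|Z) - H^{\varepsilon/8}_{\max}(X|Y) \right] - 4\log\frac{1}{\varepsilon} - 16. \qquad (10)$$
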

To complete the argument for the private capacity, note that 
Alice could precede the channel with another mapping from 
$T$ to $X$, which she is free to optimize. Regarding this as part 
of the original channel in the above argument then leads to 
the desired result. 

The direct part for the channel capacity follows by making 
a few small modifications. First, the Markov inequality is 
no longer needed to ensure the two conditions of private 
communication are satisfied. Here there is only one, and 
certainly there exists an encoding with average codeword error 
probability less than the average over codes and codewords, 
$\varepsilon_1 + \varepsilon_2$. We then only require Remark 1 rather than 
Lemma 1 to move to the worst-case error $2(\varepsilon_1 + \varepsilon_2)$ 
over codewords. Finally, though in principle it is also possible for Alice to 
precede the channel with a $T \to X$ mapping, we shall see in 
the converse that this is not necessary. Note also that in this 
context the encoder can dispense with the randomness needed 
to properly simulate $X$ and just fix a particular value of the 
output, for instance the $x$ with the largest $P_{X=x|C=c^*}$. ■ 

III. Converse 

We first prove the converse for the private capacity and 
then modify the argument to establish the converse for the 
classical capacity. Given an $(n,\varepsilon)$-private coding scheme, the 
two requirements of the output made by the definition imply 
that $H^{\sqrt{2\varepsilon}}_{\max}(M|M') \le 0$ and 
$H^{\sqrt{2\varepsilon}}_{\min}(M|Z) \ge n$, the former for 
any distribution of messages and the latter for the uniform 
distribution. The former follows because the trace distance of the 
pair $(M, M')$ to $(M, M)$ is less than $\varepsilon$ and therefore $(M, M)$ 
is in the $\sqrt{2\varepsilon}$-neighborhood of $(M, M')$ (see the appendix; 
the square root is a consequence of the conversion of the 
trace to purification distance). But $H_{\max}(M|M) = 0$ and thus 
$H^{\sqrt{2\varepsilon}}_{\max}(M|M') \le 0$. The latter follows because again 
the ideal output, in which $Z$ is independent of the uniformly-distributed 
$M$ and therefore satisfies $H_{\min}(M|Z) \ge n$, is in the 
$\sqrt{2\varepsilon}$-neighborhood of the actual pair $(M,Z)$. Additionally, by 
the data processing inequality [5], 
$H^{\sqrt{2\varepsilon}}_{\max}(M|M') \ge H^{\sqrt{2\varepsilon}}_{\max}(M|Y)$ 
since the decoder generates the guess $M'$ from $Y$. 
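
The two ideal-case values used above, $H_{\max}(M|M) = 0$ and $H_{\min}(M|Z) = n$ for $Z$ independent of a uniform $M$, can be verified directly for classical distributions. The sketch below assumes the standard unsmoothed classical specializations $H_{\min}(X|Y) = -\log_2 \sum_y \max_x P(x,y)$ and $H_{\max}(X|Y) = \log_2 \sum_y \big(\sum_x \sqrt{P(x,y)}\big)^2$; the appendix definitions are not reproduced in this section, so these formulas and the function names are assumptions of the example.

```python
from math import log2, sqrt

def h_min(joint):
    """H_min(X|Y) = -log2 sum_y max_x P(x, y) for a classical joint {(x, y): p}."""
    best = {}
    for (x, y), p in joint.items():
        best[y] = max(best.get(y, 0.0), p)
    return -log2(sum(best.values()))

def h_max(joint):
    """H_max(X|Y) = log2 sum_y (sum_x sqrt(P(x, y)))^2 for a classical joint."""
    amp = {}
    for (x, y), p in joint.items():
        amp[y] = amp.get(y, 0.0) + sqrt(p)
    return log2(sum(a * a for a in amp.values()))

n = 3
messages = range(2 ** n)

# Ideal decoding: M' = M with M uniform, so the guess determines the message.
perfect_decoding = {(m, m): 1 / 2 ** n for m in messages}

# Ideal secrecy: Z is a uniform bit independent of the uniform message M.
perfect_secrecy = {(m, z): 1 / 2 ** (n + 1) for m in messages for z in (0, 1)}

print(h_max(perfect_decoding), h_min(perfect_secrecy))
```

Smoothing is not modelled here; the converse only needs that the actual states lie within the stated neighborhoods of these ideal ones.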

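Defining $P_M$ to be the uniform distribution, we have 

$$\begin{aligned}
\max_{P_M,\, M \to X} \left[ H^{\sqrt{2\varepsilon}}_{\min}(M|Z) - H^{\sqrt{2\varepsilon}}_{\max}(M|Y) \right]
&\ge \max_{M \to X} \left[ H^{\sqrt{2\varepsilon}}_{\min}(M|Z)_{P_M} - H^{\sqrt{2\varepsilon}}_{\max}(M|Y)_{P_M} \right] \\
&\ge \max_{M \to X} \left[ H^{\sqrt{2\varepsilon}}_{\min}(M|Z)_{P_M} - H^{\sqrt{2\varepsilon}}_{\max}(M|M')_{P_M} \right] \\
&\ge n,
\end{aligned}$$

which is the form we set out to prove. 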

For the converse of the classical capacity, observe that the 
encoding function Enc is without loss of generality determin- 
istic and injective. It might as well be deterministic, since if it 
used randomness, we could make it deterministic by fixing 
the randomness to that value with the least probability of 
error, which cannot be worse than the average case. Moreover, 
for this deterministic choice, Enc must be injective, since a 
collision of two inputs having the same codeword necessarily 
implies an error. Now, using the injectivity of Enc we can 
define a distribution Px given a distribution over M by 
simply taking the distribution of M on its image in X and 
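zero otherwise. Choosing the uniform distribution over $M$ 
and observing that $H_{\min}(M) = n$ when $M$ is uniformly 
distributed, we obtain 

$$\begin{aligned}
\max_{P_X} \left[ H_{\min}(X) - H^{\sqrt{2\varepsilon}}_{\max}(X|Y) \right]
&\ge H_{\min}(X)_{P_X} - H^{\sqrt{2\varepsilon}}_{\max}(X|M')_{P_X} && (11),(12) \\
&= H_{\min}(M) - H^{\sqrt{2\varepsilon}}_{\max}(M|M') && (13) \\
&\ge n. && (14)
\end{aligned}$$

IV. Asymptotic Analysis 
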

In the asymptotic limit of $n \to \infty$ uses of a memoryless 
channel we recover the known results on the rate of public or 
private communication, where the rate of private communication 
is defined by 

$$R_{\mathrm{prv}}(\Theta) = \lim_{\varepsilon \to 0} \lim_{n \to \infty} \frac{1}{n} C^{\varepsilon}_{\mathrm{prv}}(\Theta^{\otimes n}) \qquad (15)$$

and the rate of public communication is defined similarly. In 
general, the rates take the rather ungainly form 

$$R_{\mathrm{pub}}(\Theta) = \lim_{\ell \to \infty} \frac{1}{\ell} \max_{P_{X^\ell}} \left[ H(X^\ell) - H(X^\ell|Y^{\otimes \ell}) \right], \qquad (16)$$

$$R_{\mathrm{prv}}(\Theta) = \lim_{\ell \to \infty} \frac{1}{\ell} \max_{P_T,\, T \to X^\ell} \left[ H(T|Z^{\otimes \ell}) - H(T|Y^{\otimes \ell}) \right]. \qquad (17)$$

Here $X^\ell$ refers to a classical random variable on $\mathcal{X}^\ell$, while 
$Y^{\otimes \ell}$ refers to the $\ell$-fold tensor product of the Hilbert space 
and similarly for $Z$. The rate for public communication 
over quantum channels is known as the HSW theorem, after 
Holevo [11] and Schumacher and Westmoreland [12]. The 
private rate was proven by Devetak [13]. 

For the special case of classical channel outputs, i.e. the $\vartheta^Y_x$ 
(and separately the $\vartheta^Z_x$) are all simultaneously diagonalizable, 
these reduce to the familiar and simpler form 

$$R_{\mathrm{pub}}(\Theta_{\mathrm{cl}}) = \max_{P_X} \left[ H(X) - H(X|Y) \right], \qquad (18)$$

$$R_{\mathrm{prv}}(\Theta_{\mathrm{cl}}) = \max_{P_T,\, T \to X} \left[ H(T|Z) - H(T|Y) \right]. \qquad (19)$$

The classical rate is Shannon's original noisy channel coding 
theorem [7]. The private rate was first established by Wyner 
in the specific setting of the wire-tap channel [8], later 
expanded to arbitrary channels by Ahlswede and Csiszar [9], 
and strengthened to the stronger form of security used here 
(cf. Eq. (2)) by Maurer and Wolf [10]. 
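
As a concrete instance of Eq. (18), the maximization over $P_X$ can be carried out numerically for a small classical channel. The following Python sketch assumes a binary symmetric channel with flip probability $f = 0.11$ (an arbitrary illustrative choice) and a simple grid search over input distributions; for this channel the maximum is known to be $1 - h_2(f)$, attained at the uniform input.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def rate(p_x, w):
    """H(X) - H(X|Y) for input distribution p_x and channel matrix w[x][y] = P(y|x)."""
    joint = [[px * wxy for wxy in row] for px, row in zip(p_x, w)]
    p_y = [sum(col) for col in zip(*joint)]
    h_x_given_y = entropy([p for row in joint for p in row]) - entropy(p_y)
    return entropy(p_x) - h_x_given_y

f = 0.11  # assumed flip probability of the binary symmetric channel
bsc = [[1 - f, f], [f, 1 - f]]

# Grid search over input distributions P_X = (1 - q, q).
grid = [i / 1000 for i in range(1001)]
best_q = max(grid, key=lambda q: rate([1 - q, q], bsc))
capacity = rate([1 - best_q, best_q], bsc)
print(best_q, capacity)
```

A grid search is used only for transparency; for larger alphabets an iterative method such as the Blahut-Arimoto algorithm would be the usual choice.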

The proof proceeds in two steps. First we show that such 
rates are possible using the lower bound on the capacity and 
applying the asymptotic equipartition property (AEP) of the 
conditional min- and max-entropies. Then we show that the 
rates cannot be exceeded by making use of the upper bound 
on the capacity and bounds on the conditional min- and max- 
entropy in terms of the conditional von Neumann entropy. 
Since the case of public communication follows from that 
of private communication, we only give the argument for the 
latter. 

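For the direct part, we begin with the lower bound on the 
capacity from Theorem 1. For $m$ uses of the channel $\Theta$, this 
becomes 

$$C^{\varepsilon}_{\mathrm{prv}}(\Theta^{\otimes m}) \ge \max_{P_T,\, T \to X^m} \left[ H^{\varepsilon/8}_{\min}(T|Z^{\otimes m}) - H^{\varepsilon/8}_{\max}(T|Y^{\otimes m}) \right] - 4\log\frac{1}{\varepsilon} - 16. \qquad (20)$$

Since this is a lower bound, we are free to choose $T$ as we like. 
We choose $T = T^m$ to be i.i.d., each instance $T_i$ separately 
generating the likewise i.i.d. $X_i$ via some fixed map $T \to X$. 
Now we make use of the AEP, which states [6] 

$$\lim_{\varepsilon \to 0} \lim_{m \to \infty} \frac{1}{m} H^{\varepsilon}_{\min}(X^m|Z^{\otimes m}) = H(X|Z), \qquad (21)$$
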
and similarly for the conditional max-entropy. We then obtain 

$$\lim_{\varepsilon \to 0} \lim_{m \to \infty} \frac{1}{m} C^{\varepsilon}_{\mathrm{prv}}(\Theta^{\otimes m}) \ge \max_{P_T,\, T \to X} \left[ H(T|Z) - H(T|Y) \right]. \qquad (22)$$

Finally, we let $n = \ell m$, and replay the above argument using 
the superchannel $\Theta^{\otimes \ell}$ ($\ell$ independent uses of the channel) to 
obtain the desired result. 

Note that, in contrast to the standard proof technique, 
here we only need to make a statement about the entropies 
in the capacity formula, a statement provided by the AEP. 
Importantly, typical sequences, type classes, or the like play 
no role in the protocol itself, but could be used to establish the 
AEP. Such methods are not necessary; indeed, the approach 
of [6] is based on properties of Rényi entropies. 

To complete the argument, we consider the upper bound 
on the capacity, and use the bounds on the conditional min- 
and max-entropies from Lemma 5, replacing $\dim(A)$ with $|\mathcal{X}|$. 
Now the upper bound on the private capacity becomes 

$$C^{\varepsilon}_{\mathrm{prv}}(\Theta^{\otimes n}) \le \max_{P_T,\, T \to X^n} \left[ H(T|Z^{\otimes n}) - H(T|Y^{\otimes n}) \right] + 16 n \sqrt{2\varepsilon} \log|\mathcal{X}| + 4 h_2\!\left(2\sqrt{2\varepsilon}\right). \qquad (23)$$

Dividing through by $n$ and taking the limits $n \to \infty$ and 
$\varepsilon \to 0$ (whose order is now irrelevant) yields the desired result 
(replacing $n$ by $\ell$). 

When the channel has purely classical outputs, the limit 
involving $\ell \to \infty$ (called regularization) can be removed. For 
private communication we show this explicitly in Lemma 6, 
which recovers Eq. (19); for public communication (Eq. (18)) 
see, e.g., Theorem 4.2.1 in [19]. Should the channel produce 
quantum outputs, regularization is known to be necessary in 
both cases, private [22] and public [23]. 

Finally, we note that the optimization over maps $T \to X$ 
is generally necessary to achieve the optimal rate of private 
communication, by means of the following example. Suppose 
$\Theta$ is a purely classical channel defined in Fig. 2. To send 
private messages to $Y$, clearly one can encode 0 as $X = 0$ 
or $X = 1$ randomly and 1 as $X = 2$. The message can be 
unambiguously determined from $Y$ no matter the encoding, 
but $Z$ will be completely random for either input, and so 
this encoding scheme achieves a rate of 1 bit. Eschewing 
random encoding, the maximum rate of private communication 
is $\max_{P_X} [H(X|Z) - H(X|Y)]$. Due to the structure of the 
$X \to Z$ map, the maximum of the first term is one, and this 
can only occur when 0 and 1 occur with equal probability. But 
this implies the second term is nonzero, meaning the overall 
rate is less than one. 
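
The example can be explored numerically. Since the figure itself is not reproduced in this text, the transition tables `W_Y` and `W_Z` below are an assumption, chosen to be consistent with the description: inputs 0 and 1 both give $Y = 0$ and input 2 gives $Y = 1$, while $Z$ copies inputs 0 and 1 and is a uniformly random bit for input 2. Under this assumed channel the Python sketch compares the randomized encoding with a grid search over deterministic input distributions $P_X$.

```python
from math import log2

# Assumed reading of the channel in Fig. 2 (the figure is not available here).
W_Y = {0: {0: 1.0}, 1: {0: 1.0}, 2: {1: 1.0}}
W_Z = {0: {0: 1.0}, 1: {1: 1.0}, 2: {0: 0.5, 1: 0.5}}

def entropy(dist):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def cond_entropy(p_t, t_to_x, w):
    """H(T|B), where T -> X via t_to_x[t][x] and X -> B via w[x][b]."""
    joint, marg = {}, {}
    for t, pt in p_t.items():
        for x, px in t_to_x[t].items():
            for b, pb in w[x].items():
                p = pt * px * pb
                joint[(t, b)] = joint.get((t, b), 0.0) + p
                marg[b] = marg.get(b, 0.0) + p
    return entropy(joint) - entropy(marg)

def rate(p_t, t_to_x):
    """H(T|Z) - H(T|Y) for the assumed channel."""
    return cond_entropy(p_t, t_to_x, W_Z) - cond_entropy(p_t, t_to_x, W_Y)

# Without random encoding: T = X, maximised over P_X on a grid.
identity = {x: {x: 1.0} for x in range(3)}
steps = 100
best_direct = max(
    rate({0: i / steps, 1: j / steps, 2: (steps - i - j) / steps}, identity)
    for i in range(steps + 1) for j in range(steps + 1 - i))

# With random encoding: message 0 -> X uniform on {0, 1}, message 1 -> X = 2.
randomized = rate({0: 0.5, 1: 0.5}, {0: {0: 0.5, 1: 0.5}, 1: {2: 1.0}})
print(best_direct, randomized)
```

Under these assumptions the randomized encoding attains one bit, while the grid search over $P_X$ alone stays strictly below one bit, in line with the argument above.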

V. Conclusions 

Rather than reusing the proof techniques involved in 
understanding more basic information processing protocols such 
as privacy amplification and information reconciliation, here 
we have shown how to construct channel coding protocols 
from these protocols themselves. Moreover, if the underlying 
protocols are optimal, then so are the channel coding protocols. 
This provides an appealing conceptual framework for 
two-terminal problems in information theory in which one 
successively builds up to more complicated protocols using 
simpler elements whose internal workings are not relevant for 
the present task. Moreover, it is also appealing to see that 
the two basic primitives are characterized in terms of the two 
basic entropic quantities, smooth min- and max-entropy, and 
that these quantities enter the capacity expressions in a way 
which reflects the protocol construction. 

Fig. 2. Channel demonstrating the need for randomness by the encoder 
to achieve the private communication capacity. Unmarked arrows denote 
deterministic maps; otherwise the probability of a transition is marked. 

In the setting of quantum information theory these entropies are dual (see Eq. (27)), as are the two primitives [24], meaning only one primitive is needed to construct more and more complicated protocols. As an example, instead of appealing to Theorem 3 for the compressor/decompressor pair needed to establish Eq. (9) in Theorem 1, we may rely on

1) Theorem 2,

2) the duality of privacy amplification and information reconciliation as shown in [24], and

3) a new form of the uncertainty principle derived in [25].
Specifically, in the proof of Theorem 1 we require a linear compressor/decompressor pair operating on the classical-quantum state of $U'Y$, classical in, say, the $U'$ basis (in an abuse of notation). By the duality in [24], such a pair with error probability $\sqrt{2\epsilon}$ can be constructed from $\epsilon$-good linear privacy amplification of the conjugate basis $\tilde{U}'$,$^{3}$ and the size of the compressed output $C$ in the optimal case is given by $\log_2 |C| = \log_2 |U| - H_{\min}^{\epsilon_1}(\tilde{U}'|R) + 2\log\frac{1}{\epsilon_2} - 1$, where $R$ is the purification of the original system $XY$ and $\epsilon = \epsilon_1 + \epsilon_2$. But from the uncertainty principle of [25] we have

$$H_{\min}^{\epsilon_1}(\tilde{U}'|R) + H_{\max}^{\epsilon_1}(U'|Y) \geq \log |U|. \tag{24}$$

Therefore $\log_2 |C| \leq H_{\max}^{\epsilon_1}(U'|Y) + O(\log\frac{1}{\epsilon})$, and we recover Eq. (9) up to $O(\log\frac{1}{\epsilon})$ terms.$^{4}$ Thus we have constructed all-decoupling proofs of the public and private capacities of a quantum channel, in the sense that establishing the capacity now does not rely on directly constructing a decoder for the receiver, as in the proof of Theorem 3, but rather on decoupling the purifying system, as in the proof of Theorem 2.$^{5}$ In the asymptotic limit of many independent uses of the channel, we then recover the familiar HSW [11], [12] and private capacity [13] results. This derivation of the classical capacity of a quantum channel can thus be seen as the classical-quantum analogue of [26], where the quantum capacity of a quantum channel is derived using a decoupling approach. As with the main proof of Theorem 1, the decoupling procedure outlined above also suffices to derive Shannon's result on the public capacity of classical channels, as well as the associated privacy capacity results [8], [9], [10], simply by treating the classical channel in the formalism of quantum information theory.

3 The cq nature of the state ensures that we satisfy condition (b) of Theorem 4 in [24].

4 $\log_2 |C|$ also cannot be substantially smaller, by the lower bound of Theorem 3, and the capacity as calculated here is certainly no smaller than that of Eq. [?] (up to $O(\log\frac{1}{\epsilon})$ terms).

5 Observe that we could not have used a similar form of the uncertainty principle derived in [24] for the present purpose, as it subtly relies on the decoder construction we are trying to avoid.
Appendix

The conditional max-entropy for a state $\rho^{AB}$ is defined by

$$H_{\max}(A|B)_\rho = \max_{\sigma^B} 2\log F(\rho^{AB}, \mathbb{1}^A \otimes \sigma^B), \tag{25}$$

where the maximization is over positive, normalized states $\sigma$ and $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$ is the fidelity of $\rho$ and $\sigma$. Dual to the conditional max-entropy is the conditional min-entropy,

$$H_{\min}(A|B)_\rho = \max_{\sigma^B} \left(-\log \lambda_{\min}(\rho^{AB}, \sigma^B)\right), \tag{26}$$

with $\lambda_{\min}(\rho^{AB}, \sigma^B) = \min\{\lambda : \rho^{AB} \leq \lambda\, \mathbb{1}^A \otimes \sigma^B\}$. The two are dual in the sense that

$$H_{\max}(A|B)_\rho = -H_{\min}(A|C)_\rho \tag{27}$$

for $\rho^{ABC}$ a pure state [27]. The min- and max-entropies derive their names in part from the following relation (Lemma 2, [6]),

$$H_{\min}(A|B)_\rho \leq H(A|B)_\rho \leq H_{\max}(A|B)_\rho, \tag{28}$$

where $H(A|B)_\rho = H(AB)_\rho - H(B)_\rho$ is the usual conditional von Neumann entropy, with $H(B)_\rho = -\mathrm{Tr}[\rho^B \log \rho^B]$.
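For purely classical states the optimizations in Eqs. (25) and (26) can be carried out in closed form, which makes the ordering in Eq. (28) easy to check numerically. The sketch below assumes a joint distribution stored as a table `joint[x][y]` and uses the classical expressions $H_{\min}(X|Y) = -\log\sum_y \max_x P(x,y)$ and $H_{\max}(X|Y) = \log\sum_y (\sum_x \sqrt{P(x,y)})^2$; these specializations and the function names are ours and are not stated in the text above.

```python
import math


def h_min(joint):
    """Classical conditional min-entropy: -log2 of the optimal guessing probability."""
    n_x, n_y = len(joint), len(joint[0])
    guess = sum(max(joint[x][y] for x in range(n_x)) for y in range(n_y))
    return -math.log2(guess)


def h_max(joint):
    """Classical conditional max-entropy: log2 sum_y (sum_x sqrt(P(x, y)))^2."""
    n_x, n_y = len(joint), len(joint[0])
    total = sum(sum(math.sqrt(joint[x][y]) for x in range(n_x)) ** 2 for y in range(n_y))
    return math.log2(total)


def h_shannon(joint):
    """Conditional Shannon entropy H(X|Y) in bits."""
    n_x, n_y = len(joint), len(joint[0])
    h = 0.0
    for y in range(n_y):
        p_y = sum(joint[x][y] for x in range(n_x))
        for x in range(n_x):
            p = joint[x][y]
            if p > 0:
                h -= p * math.log2(p / p_y)
    return h


if __name__ == "__main__":
    # X is a noisy copy of a uniform bit Y (illustrative numbers).
    joint = [[0.4, 0.1], [0.1, 0.4]]
    print(h_min(joint), h_shannon(joint), h_max(joint))
```

For the example table the three values come out in increasing order, as Eq. (28) requires.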

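The min- and max-entropies can be smoothed by considering possibly subnormalized states $\bar{\rho}^{AB}$ in the $\epsilon$-neighborhood of $\rho^{AB}$, defined using the purification distance $P(\rho,\sigma) = \sqrt{1 - F(\rho,\sigma)^2}$:

$$B_\epsilon(\rho) = \{\bar{\rho} : P(\rho,\bar{\rho}) \leq \epsilon\}. \tag{29}$$

Note that the purification distance is essentially equivalent to the trace distance, due to the bounds $D(\rho,\sigma) \leq P(\rho,\sigma) \leq \sqrt{2D(\rho,\sigma)}$ [5]. The smoothed entropies are then given by

$$H_{\min}^{\epsilon}(A|B)_\rho = \max_{\bar{\rho} \in B_\epsilon(\rho^{AB})} H_{\min}(A|B)_{\bar{\rho}}, \tag{30}$$

$$H_{\max}^{\epsilon}(A|B)_\rho = \min_{\bar{\rho} \in B_\epsilon(\rho^{AB})} H_{\max}(A|B)_{\bar{\rho}}. \tag{31}$$

Furthermore, the dual of $H_{\max}^{\epsilon}(A|B)_\rho$ is $H_{\min}^{\epsilon}(A|C)_\rho$, so that taking the dual and smoothing can be performed in either order [5].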
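The relation between the two distances can be illustrated for classical, normalized distributions, where the fidelity reduces to $\sum_i \sqrt{p_i q_i}$. The sketch below assumes the standard form $P = \sqrt{1 - F^2}$ of the purification distance for normalized states; the function names and sample distributions are illustrative only.

```python
import math


def fidelity(p, q):
    """Classical fidelity (Bhattacharyya coefficient) of two distributions."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def trace_distance(p, q):
    """Classical trace distance: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def purification_distance(p, q):
    """sqrt(1 - F^2), clipped at zero to absorb floating-point rounding."""
    return math.sqrt(max(0.0, 1.0 - fidelity(p, q) ** 2))


if __name__ == "__main__":
    pairs = [([0.5, 0.5], [0.6, 0.4]),
             ([0.5, 0.5], [0.9, 0.1]),
             ([1.0, 0.0], [0.0, 1.0])]
    for p, q in pairs:
        d = trace_distance(p, q)
        # Expect d <= P <= sqrt(2 d) on each row.
        print(d, purification_distance(p, q), math.sqrt(2 * d))
```

Each printed row lists the trace distance, the purification distance and $\sqrt{2D}$, so the sandwich bound can be read off directly.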

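Optimal one-shot privacy amplification results are established in [4], [28], [29], [30]. Using the entropy definitions above, the number of $\epsilon$-good random bits $\ell_{\mathrm{ext}}^{\epsilon}(X|E)$ which can be extracted from the classical random variable $X$ against a possibly quantum eavesdropper is bounded as follows.

Theorem 2 (Privacy Amplification). Given a state $\psi^{XE} = \sum_x p_x |x\rangle\langle x|^X \otimes \varphi_x^E$ and $\epsilon_1, \epsilon_2 \geq 0$ such that $\epsilon = \epsilon_1 + \epsilon_2$,

$$H_{\min}^{\epsilon_1}(X|E)_\psi - 2\log\frac{1}{\epsilon_2} + 1 \leq \ell_{\mathrm{ext}}^{\epsilon}(X|E)_\psi \leq H_{\min}^{\sqrt{2\epsilon}}(X|E)_\psi.$$

Meanwhile, optimal one-shot information reconciliation results are given in [31]. The minimum number of bits $\ell_{\mathrm{comp}}^{\epsilon}(X|B)$ to which the classical random variable $X$ can be compressed and still be recovered using side information $B$ at the decoder with error probability less than $\epsilon$ is bounded as follows.

Theorem 3 (Classical-Quantum Information Reconciliation). Given a state $\psi^{XB} = \sum_x p_x |x\rangle\langle x|^X \otimes \varphi_x^B$ and $\epsilon_1, \epsilon_2 \geq 0$ such that $\epsilon = \epsilon_1 + \epsilon_2$,

$$H_{\max}^{\sqrt{2\epsilon}}(X|B)_\psi \leq \ell_{\mathrm{comp}}^{\epsilon}(X|B)_\psi \leq H_{\max}^{\epsilon_1}(X|B)_\psi + 2\log\frac{1}{\epsilon_2} + 4.$$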

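Lemma 3 (Preimage Sizes of Linear Functions). Let $f : \mathcal{X} \to \mathcal{Y}$ be a linear function. Then $|f^{-1}(y)| = |\mathcal{X}|/|\mathcal{Y}|$ for all $y \in \mathcal{Y}$.

Proof: Pick an $x^*$ and consider all the $x_j \in \mathcal{X}$ such that $f(x_j) = f(x^*)$. Forming the differences $w_j = x_j - x^*$, it follows from linearity that $f(w_j) = 0$ for all $j$. Now consider an arbitrary $x' \in \mathcal{X}$. Clearly $f(x' + w_j) = f(x')$ for all $j$, so $f(x')$ has at least as many preimages as $f(x^*)$. Exchanging the roles of $x'$ and $x^*$ shows that each output value has the same number of preimages. ■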
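The counting argument can be checked by brute force for a small linear map over GF(2). The matrices below are hypothetical examples chosen for illustration; the first has full row rank, so every output in $\{0,1\}^2$ is attained.

```python
from collections import Counter
from itertools import product


def apply_linear(matrix, x):
    """Matrix-vector product over GF(2); matrix is a list of rows."""
    return tuple(sum(m * xi for m, xi in zip(row, x)) % 2 for row in matrix)


def preimage_sizes(matrix, n_in):
    """Map each attained output y to the number of inputs x with f(x) = y."""
    return Counter(apply_linear(matrix, x) for x in product((0, 1), repeat=n_in))


# Hypothetical full-rank example: f maps 4 bits to 2 bits.
FULL_RANK = [[1, 0, 1, 1],
             [0, 1, 1, 0]]
# Hypothetical rank-deficient example: only two of the four outputs are attained.
RANK_ONE = [[1, 1, 0, 0],
            [1, 1, 0, 0]]

if __name__ == "__main__":
    print(preimage_sizes(FULL_RANK, 4))
    print(preimage_sizes(RANK_ONE, 4))
```

In the full-rank case every output has $16/4 = 4$ preimages. In the rank-deficient case the attained outputs still have equal preimage sizes, but the common size is $|\mathcal{X}|$ divided by the number of attained outputs, so the formula in the lemma applies as stated when $f$ is onto.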

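Lemma 4 (Preimage Sizes of Arbitrary Functions). Let $f : \mathcal{X} \to \mathcal{Y}$ be an arbitrary function and denote by $\mathcal{X}_y$ the preimage of an output $y$. For a randomly-chosen output value $y$, $|\mathcal{X}_y| \geq \epsilon|\mathcal{X}|/|\mathcal{Y}|$ with probability at least $1 - \epsilon$.

Proof: Let $X$ be a uniform random variable over $\mathcal{X}$ and $Y = f(X)$. By the min-entropy chain rule (Lemma 3.1.10 in [4]) we have

$$H_{\min}(X|Y) \geq H_{\min}(XY) - \log|\mathrm{supp}(P_Y)| = H_{\min}(X) - \log|\mathrm{supp}(P_Y)| \geq \log\frac{|\mathcal{X}|}{|\mathcal{Y}|}. \tag{32}$$

Here $|\mathrm{supp}(P_Y)|$ is the size of the support of the distribution $P_Y$, the number of values taking nonzero probability. By the normalization condition for $P_{X|Y=y}$ it follows that $1/|\mathcal{X}_y| \leq \max_x P_{X|Y=y}(x)$. And by the definition of $H_{\min}(X|Y)$,

$$\sum_{y \in \mathcal{Y}} P_Y(y)\frac{1}{|\mathcal{X}_y|} \leq 2^{-H_{\min}(X|Y)} \leq \frac{|\mathcal{Y}|}{|\mathcal{X}|}. \tag{33}$$

Finally, applying the Markov inequality to the random variable $1/|\mathcal{X}_Y|$ yields

$$\Pr\left[\frac{1}{|\mathcal{X}_Y|} > \frac{|\mathcal{Y}|}{\epsilon|\mathcal{X}|}\right] \leq \frac{\epsilon|\mathcal{X}|}{|\mathcal{Y}|}\sum_{y \in \mathcal{Y}} P_Y(y)\frac{1}{|\mathcal{X}_y|} \leq \epsilon, \tag{34}$$

which concludes the proof.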
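The statement can be checked exhaustively for a small, deliberately unbalanced function. The sketch assumes, as in the proof, that the output is drawn as $y = f(x)$ for uniformly random $x$. The example function (the bit length of an integer, minus one) and the helper name are illustrative choices.

```python
from collections import Counter
from fractions import Fraction


def small_preimage_probability(f, domain, codomain_size, eps):
    """Probability, for y = f(X) with X uniform on domain, that |X_y| < eps*|X|/|Y|."""
    sizes = Counter(f(x) for x in domain)
    n = len(domain)
    threshold = eps * n / codomain_size
    # P_Y(y) = |X_y| / |X| because X is uniform.
    return sum(Fraction(size, n) for size in sizes.values() if size < threshold)


def bit_length_minus_one(x):
    """Unbalanced example: preimage sizes are 1, 2, 4, ..., 512 on 1..1023."""
    return x.bit_length() - 1


if __name__ == "__main__":
    domain = list(range(1, 1024))
    for eps in (Fraction(1, 100), Fraction(1, 10), Fraction(1, 2)):
        prob = small_preimage_probability(bit_length_minus_one, domain, 10, eps)
        print(eps, float(prob))
```

Exact rational arithmetic is used so that the comparison against the threshold is not affected by rounding.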

Lemma 5 (Smooth Entropy Bounds).

$$H_{\min}^{\epsilon}(A|B)_\rho \leq H(A|B)_\rho + 8\epsilon\log\dim(A) + 2h_2(2\epsilon), \tag{35}$$

$$H_{\max}^{\epsilon}(A|B)_\rho \geq H(A|B)_\rho - 8\epsilon\log\dim(A) - 2h_2(2\epsilon), \tag{36}$$

where $h_2(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function.

Proof: Let $\bar{\rho}$ be a state in $B_\epsilon(\rho)$ such that $H_{\min}(A|B)_{\bar{\rho}} = H_{\min}^{\epsilon}(A|B)_\rho$. Then from Eq. (28) we have $H_{\min}(A|B)_{\bar{\rho}} \leq H(A|B)_{\bar{\rho}}$. Since the purification distance bounds the trace distance, $D(\rho,\bar{\rho}) \leq \epsilon$ and we can use the continuity of the conditional von Neumann entropy [32] to establish that $H(A|B)_{\bar{\rho}} \leq H(A|B)_\rho + 8\epsilon\log\dim(A) + 2h_2(2\epsilon)$, completing the proof for the min-entropy. An entirely similar argument holds for the max-entropy. ■
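To get a feel for the size of the correction term in Eqs. (35) and (36), the following sketch evaluates $8\epsilon\log\dim(A) + 2h_2(2\epsilon)$ for a system of $n$ qubits. It assumes base-2 logarithms throughout and restricts to $\epsilon \leq 1/2$ so that $h_2(2\epsilon)$ is defined; the function names are ours.

```python
import math


def binary_entropy(x):
    """h_2(x) in bits, with the convention 0 log 0 = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def smoothing_correction(eps, dim_a):
    """8*eps*log2(dim A) + 2*h_2(2*eps), the gap allowed by Eqs. (35) and (36)."""
    if not 0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 1/2]")
    return 8 * eps * math.log2(dim_a) + 2 * binary_entropy(2 * eps)


if __name__ == "__main__":
    eps = 0.01
    for n in (1, 10, 100, 1000):
        # Correction per qubit for A consisting of n qubits.
        print(n, smoothing_correction(eps, 2 ** n) / n)
```

The per-qubit correction approaches $8\epsilon$ as $n$ grows and vanishes as $\epsilon \to 0$, which is why the smooth entropies reduce to the von Neumann entropy rate in the asymptotic limit.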

Lemma 6 (Single-Letter Formula for the Private Capacity). For a channel $\Theta_{\mathrm{cl}}$ with purely classical outputs $Y$ and $Z$,

$$R_{\mathrm{prv}}(\Theta_{\mathrm{cl}}) \leq \max\,[H(T|Z) - H(T|Y)]. \tag{37}$$

Proof: Start with the expression $b = H(T|Z^{\otimes\ell}) - H(T|Y^{\otimes\ell})$ from Eq. (17). By Lemma 4.1 of [9], or direct calculation, we can rewrite this as a sum

$$b = \sum_{i=1}^{\ell} b_i, \qquad b_i = H(T|V_iZ_i) - H(T|V_iY_i), \tag{38}$$

for $V_i = Z_1 \dots Z_{i-1}Y_{i+1} \dots Y_\ell$. Observe that the random variables involved in $b_i$ form the Markov chain $(T, V_i) \leftrightarrow X_i \leftrightarrow (Y_i, Z_i)$. This follows because $(X_i, X', Y', Z') \leftrightarrow X_i \leftrightarrow (Y_i, Z_i)$ is a Markov chain, where $X'$ denotes the tuple of $X_j$ random variables omitting $X_i$ (and similarly for $Y'$ and $Z'$), and $(T, V_i)$ can be computed from $(X_i, X', Y', Z')$.

But now we can maximize each term over $(T, V_i)$ subject to the Markov chain condition and obtain the single-letter formula

$$b = \sum_{i=1}^{\ell}\left[H(T|V_iZ_i) - H(T|V_iY_i)\right] \leq \max_{(T,V)} \ell\,[H(T|VZ) - H(T|VY)],$$

for the Markov chain $(T, V) \leftrightarrow X \leftrightarrow (Y, Z)$. Since Shannon entropies conditioned on $V$ are averages of entropies conditioned on specific values $V = v$, this implies

$$b \leq \max_{(T,V,v)} \ell\,[H(T|Z, V = v) - H(T|Y, V = v)].$$

Finally, conditioning on $V = v$ preserves the Markov chain since $P_{YZ|X} = P_{YZ|XTV}$ implies $P_{YZ|X,V=v} = P_{YZ|XT,V=v}$. And because each choice of $V$ and $v$ induces a conditional distribution $P_{T|V=v}$, the maximization need only be taken over $T$. Using this in Eq. (17) completes the proof. ■
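The "direct calculation" behind the sum decomposition can be verified numerically. The sketch below checks, for a randomly generated joint distribution of binary variables, that $H(T|Z_1 \dots Z_\ell) - H(T|Y_1 \dots Y_\ell) = \sum_i [H(T|V_iZ_i) - H(T|V_iY_i)]$ with $V_i = Z_1 \dots Z_{i-1}Y_{i+1} \dots Y_\ell$. This explicit form of the identity is our reading of the decomposition used in the proof, and the variable layout and function names are assumptions made for illustration.

```python
import itertools
import math
import random


def marginal(joint, keep):
    """Marginal distribution over the coordinates listed in keep."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(joint, coords):
    """Shannon entropy in bits of the marginal on coords."""
    return -sum(p * math.log2(p) for p in marginal(joint, coords).values() if p > 0)


def cond_entropy(joint, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(joint, target + given) - entropy(joint, given)


def both_sides(n_uses=3, seed=0):
    """Return (lhs, rhs) of the identity for a random joint distribution."""
    rng = random.Random(seed)
    # Coordinate 0 is T, 1..n are Y_1..Y_n, n+1..2n are Z_1..Z_n.
    outcomes = list(itertools.product((0, 1), repeat=1 + 2 * n_uses))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    joint = {o: w / total for o, w in zip(outcomes, weights)}
    t = [0]
    y = list(range(1, n_uses + 1))
    z = list(range(n_uses + 1, 2 * n_uses + 1))
    lhs = cond_entropy(joint, t, z) - cond_entropy(joint, t, y)
    rhs = 0.0
    for i in range(n_uses):
        v = z[:i] + y[i + 1:]
        rhs += cond_entropy(joint, t, v + [z[i]]) - cond_entropy(joint, t, v + [y[i]])
    return lhs, rhs


if __name__ == "__main__":
    print(both_sides())
```

The identity is a telescoping sum and holds for any joint distribution, so the check does not depend on the Markov structure of the channel; that structure is only needed for the subsequent single-letter bound.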